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INTRODUCTION 

Classical psychometric theory is based on the notion that the
purpose of educational and psychological assessment is to sort students
or grade them from excellent to poor (Tyler and White, 1979). Recent
developments and interest in adaptive instructional systems, such as
Individually Prescribed Instruction (Glaser, 1968), and in minimum
competency testing call for new procedures focusing on the evaluation
of individual performance in terms of mastery. A test is purposely
constructed to give scores that reflect what a student can or cannot
do. Based on the observed test score, a student is classified, in
the simple two-category case, into either the "mastery" or the "non-mastery"
group for a skill. For example, a master may proceed to the
next unit or receive a diploma, and a non-master may receive
remedial work. Decision procedures tend to fall into two categories:
mastery status is granted either if the subject's observed test score
exceeds a minimum level, or if the probability is reasonably high that his or
her true score is beyond a given standard. In both cases, the dividing
line between masters and non-masters is called the cut-off score,
mastery score, or criterion. In making decisions about an examinee's
mastery status, how far the examinee is from the cut-off score is of
no concern. Instead, the main concern is whether the examinee is
above or below the cut-off score. Therefore, one essential task in
competency testing is to locate a valid cut-off score which will
classify individuals into categories representing their true mastery
status.

Cut-Score Models

At this stage of development, the setting of a cut-off score on a mastery
test usually involves a consideration of one or more of the following
elements: (1) the distribution of observed test scores; (2) the type of
mastery criterion used; (3) the level of acceptable risks of
misclassification; (4) the loss functions of misclassifications; and (5) the
distribution of true scores.

Perhaps the most ad hoc method of setting a cut-off score is to look at
the distribution of observed scores and either pass some upper proportion
of the examinees or select a cut-off point at some reasonable break in
the distribution (such as between two modes, or above or below one tail of
a skewed distribution). Over a succession of test administrations, these
procedures may lead to impressions of expected performance and a substantive
feel for what such a cut-off score standard means. However, this method
of setting a cut-score is basically a norm-referenced decision and actually
avoids the mastery/non-mastery decision problem.

True mastery can only be determined in terms of a criterion which has
been established on an empirical or a theoretical basis, or both. For example,
a theoretical criterion proposed by Nedelsky (1954) for multiple choice
tests is established in the following manner: distractors which the lowest
passing student should be able to reject are identified for each item, and
the reciprocal of the number of remaining alternatives is the minimum
passing level (MPL) for that item. A summation of these MPL's is a
theoretical minimum passing score for the overall test.
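The Nedelsky computation described above can be sketched in a few lines. This is a minimal illustration, not the procedure as published: the item data are hypothetical, the function name is invented here, and the sketch assumes that the "remaining" count includes the keyed answer, so that an item with all distractors rejected has an MPL of 1.

```python
def nedelsky_item_mpl(n_options, n_rejected):
    """Minimum passing level for one item (assumed form): the chance of
    guessing correctly among the options the borderline student cannot reject."""
    remaining = n_options - n_rejected  # includes the keyed answer (assumption)
    if remaining < 1:
        raise ValueError("at least the keyed answer must remain")
    return 1.0 / remaining


# Hypothetical items: (number of options, distractors the lowest passing
# student is judged able to reject).
items = [(4, 2), (4, 3), (5, 0), (4, 1)]

# The theoretical minimum passing score is the sum of the item MPLs.
test_mpl = sum(nedelsky_item_mpl(k, d) for k, d in items)
print(round(test_mpl, 3))  # 2.033
```

For these four made-up items the minimum passing score is about 2 of 4 points; on a real test the judgments about rejectable distractors would come from content experts.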

Alternatively, one can identify a criterion such as observable success
in a closely related task, and a cut-off score can be chosen so that the number
of misclassifications is minimized. Such misclassifications can be of
two types: (1) false positives, reflecting those who are non-masters on
the criterion but are classified as masters by the test; and (2) false
negatives, reflecting those who are masters on the criterion but who are
classified by the test as non-masters. If one uses observed scores and
a criterion has been selected in terms of mastery ability θ, where 0 ≤ θ ≤ 1,
one would want to adjust the cut-off score according to the level of
acceptable risks associated with each of the two types of misclassification.
For example, a school may be willing to admit non-masters to its program,
but only up to 10% of the overall enrollment, while it does not wish to turn
away more than, say, 20% of the true masters who apply for participation.
A cut-off score could then be chosen such that the compound binomial
probability of misclassification for a given ability parameter of true
mastery would not exceed the established risk levels. A solution to this
problem, of course, depends on having a sufficient number of test items.

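Stig Fhaner (1974) poses the problem as follows.
Find the critical score C such that

$$
\sum_{x=C}^{n} \binom{n}{x}\,\theta_0^{x}(1-\theta_0)^{n-x} \le \alpha
\quad\text{and}\quad
\sum_{x=0}^{C-1} \binom{n}{x}\,\theta_1^{x}(1-\theta_1)^{n-x} \le \beta
\tag{1}
$$

where θ_0 = universe score definitely insufficient for passing
      θ_1 = universe score definitely sufficient for passing
      α = tolerable risk of accepting a non-master
      β = tolerable risk of rejecting a master
      n = number of test items
      x = observed score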
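A direct way to see how the two risk constraints interact is to enumerate every possible critical score and keep those that satisfy both binomial tail conditions. The sketch below assumes the two-inequality reading of (1) given above; the function names and the toy numbers are illustrative only.

```python
from math import comb


def binom_pmf(n, x, theta):
    """Binomial probability of exactly x correct answers out of n."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def feasible_cut_scores(n, theta0, theta1, alpha, beta):
    """All critical scores C meeting both risk limits.

    theta0: ability definitely insufficient for passing
    theta1: ability definitely sufficient for passing
    """
    feasible = []
    for c in range(n + 1):
        # Risk of passing an examinee whose ability is only theta0.
        false_pos = sum(binom_pmf(n, x, theta0) for x in range(c, n + 1))
        # Risk of failing an examinee whose ability is theta1.
        false_neg = sum(binom_pmf(n, x, theta1) for x in range(0, c))
        if false_pos <= alpha and false_neg <= beta:
            feasible.append(c)
    return feasible


# Toy case: a two-item test with widely separated abilities.
toy = feasible_cut_scores(2, 0.1, 0.9, 0.2, 0.2)
print(toy)  # [1, 2]
```

An empty list means that no critical score satisfies both limits, in which case either the test must be lengthened or one of the risk levels relaxed.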



Related to these risk levels are measures of loss associated with
each type of misclassification. Losses can be specified in terms of
time or costs. For example, the losses associated with admitting a
non-master might be loss of training costs or time wasted in pursuing an
unsuccessful endeavor. Losses associated with rejecting a master might
involve postponement of societal benefits, loss of institutional revenue,
or time wasted on needless remedial training. If the losses can be
specified, then the mastery score problem becomes one of finding that score
which will minimize them. Huynh (1976) incorporates the probability of
success on a referral task into determining a rule allowing for an optimal
decision. He specifies the loss function R(c) to be minimized as follows:

$$
R(c) = \int_{0}^{1}\!\int_{x \ge c} C_f(\theta)\,[1 - S(\theta)]\,p(\theta)\,f(x \mid \theta)\,dx\,d\theta
     + \int_{0}^{1}\!\int_{x < c} C_s(\theta)\,S(\theta)\,p(\theta)\,f(x \mid \theta)\,dx\,d\theta
$$

where

C_f(θ): loss of granting mastery status to a failure

C_s(θ): loss of assigning non-mastery status to a success

S(θ): probability of success on a criterion

f(x|θ): probability density function of observed scores given θ

θ: universe score of ability, 0 < θ < 1

c: cut-score

p(θ): probability density function of θ

The minimization of the double integral and solutions for the cut-score
c can be approximated if a beta distribution is assumed for the
ability θ and the binomial distribution of observed scores is approximately
described by the normal distribution (large n and parameter θ not near
1 or 0). Also, the loss ratio C_s/C_f must be constant and the function
S(θ) close to a 0-1 form. The solution can then be expressed as

$$
c \approx (n+\alpha+\beta-1)\,\theta_0 + z\sqrt{(n+\alpha+\beta-1)\,\theta_0(1-\theta_0)} - \alpha + .5
$$

where

α, β: parameters of the beta distribution

θ_0: the value of θ associated with true mastery

z: the 100/(1+Q) percentile of the unit normal distribution

Q: C_s/C_f

In summary, many different approaches to setting cut-off scores have
been advanced. The purpose of the present research was to compare the
results derived from the various approaches.

Applications of Models 

In order to illustrate several procedures for setting cut-off scores,
and how various considerations may change the cut-off score value, a data
set was obtained consisting of 99 foreign engineering graduate students' test
scores on a sample of 87 items from the UCLA English as a Second Language
proficiency test, their GPA, the number of university courses failed, and
GRE percentile scores (Table 1). Since the ESL test was administered to
determine if remedial English courses were required for successful
performance in graduate work, GPA and number of courses failed were used as
external criteria of English mastery. However, it is acknowledged that, in
addition to language proficiency, achievement in graduate work is highly
dependent on other factors such as previous preparation in related work,
amount of effort, and quality of instruction.

TABLE 1

Means and Standard Deviations of ESL Data

     Variable Name            X̄          σ

  1. General GPA             3.45       0.38
  2. YR1 GPA                 3.43       0.38
  3. GRE Verbal             15.55      17.45
  4. GRE Quantitative       88.89       9.74
  5. GRE Advanced           58.55      26.71
  6. ESL Score              59.84     157.50



Norm-Referenced vs. Theoretical Criterion

Based on the past few years' records, approximately 26 percent to
30 percent of the students taking the ESL exam each year are declared
proficient enough to take university courses without remedial English
courses. For the 87 item test considered here, the upper 30th percentile
corresponds to a test score of 68. This percentile score was based on
a total of 1150 students, university wide, of which the 99 engineering
graduate students were a sub-group. Although no theoretical mastery cut-off
score is explicitly stated by the test-makers, it does appear that
exemption status is associated with at least the ability to answer 75 percent
of the items correctly. If such a proportion of correct answers is used
as the theoretical mastery criterion, then minimal competency is
associated with a score of 66 or above (75 percent of 87 items is 65.25).
These different criteria result in
different classifications of mastery/non-mastery status according to
normed placement (cut-score set at the upper 26th and 30th percentiles) or
theoretical criterion (Table 2 and Table 3).



TABLE 2

Cross Tabulations of Mastery/Non-Mastery
by Normed Placement (Upper 30th Percentile)

  ESL Normed Placement,          Theoretical Criterion, 75th Percentile (c=66)
  Upper 30th Percentile
  (c=68)                         Mastery      non-Mastery      Total

  Mastery                           31              0             31
  non-Mastery                        7             61             68
  Total                             38             61             99



TABLE 3

Cross Tabulations of Mastery/Non-Mastery
by Normed Placement (Upper 26th Percentile)

  ESL Normed Placement,          Theoretical Criterion, 75th Percentile (c=66)
  Upper 26th Percentile
  (c=69)                         Mastery      non-Mastery      Total

  Mastery                           29              0             29
  non-Mastery                        9             61             70
  Total                             38             61             99

The results of these cross tabulations indicate that if the theoretical
criterion were taken as the true mastery standard, then misclassification
only occurred when true masters were put in the non-mastery category,
implying that a false-negative type of error was seen as less serious than
passing a non-master into mastery status.
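The empty cell in Tables 2 and 3 follows directly from the fact that both classifications are thresholds on the same score: anyone at or above the higher cut is necessarily at or above the lower one. A small sketch with hypothetical scores (not the study data) makes the pattern visible; the function name is illustrative.

```python
def cross_tabulate(scores, row_cut, col_cut):
    """2x2 counts of mastery status under two cut-off scores on the same test."""
    table = {
        ("master", "master"): 0,
        ("master", "non-master"): 0,
        ("non-master", "master"): 0,
        ("non-master", "non-master"): 0,
    }
    for score in scores:
        row = "master" if score >= row_cut else "non-master"
        col = "master" if score >= col_cut else "non-master"
        table[(row, col)] += 1
    return table


# Hypothetical scores; rows use the normed cut (68), columns the
# theoretical cut (66), mirroring the layout of Table 2.
scores = [60, 66, 67, 68, 70]
table = cross_tabulate(scores, row_cut=68, col_cut=66)
for cell, count in table.items():
    print(cell, count)
```

Whenever the row cut is the higher of the two, the ("master", "non-master") cell is zero by construction, so all disagreement falls in the ("non-master", "master") cell.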

Cut-Scores Based on Acceptable Risks of Misclassification

In applying Stig Fhaner's method of incorporating acceptable risk
levels in the setting of cut-off scores, the normal approximation was used
to compute the cut-scores which would result in α ≤ .01 and β ≤ .10. Given
that the length of the test is fixed at 87 items and the α error must be
very small, the cut-off score becomes a function of the values one uses
for the ability which is definitely sufficient for success and the ability
which is definitely insufficient for success. If one were to use .75 and
.60 respectively for these values, then:

$$
\frac{x_1 + .5 - 87(.75)}{\sqrt{87(.75)(.25)}} = -1.28 \;\Rightarrow\; x_1 = 59.58
$$

$$
\frac{x_2 - .5 - 87(.60)}{\sqrt{87(.60)(.40)}} = 2.33 \;\Rightarrow\; x_2 = 63.34
$$

Since there is a discrepancy in the cut-off scores (x_1, x_2), with x_2
exceeding x_1 so that no single score satisfies both risk levels, the only
solution is either to increase the number of test items or to relax the risk
level. If the α level is relaxed to .05, then 1.645 is substituted for 2.33
and x_2 is computed to be 60.21. This would result in a cut-off score of
61, which corresponds to being able to answer over 70 percent of the items
correctly. A cross-tabulation table of the theoretical criterion of .75
by this risk-incorporated cut-off score is shown in Table 4.



TABLE 4

Cross Tabulations of Mastery/non-Mastery by Tolerable Risk
Placement (α=.05, β=.10) versus Theoretical Mastery Ability .75

  Risk-incorporated              Theoretical Criterion = 75 Percent Items Correct (c=66)
  cut-score
  (c=61)                         Mastery      non-Mastery      Total

  Mastery                           38              8             46
  non-Mastery                        0             53             53
  Total                             38             61             99



By this standard, then, the number of false masters is increased over the
norm-referenced procedures and the number of false non-masters goes to
zero. However, if α is set to .01 and β is allowed to go to .25, then
the cut-off score would become 63 (see Table 5).



TABLE 5

Cross Tabulations of Mastery/non-Mastery by Tolerable Risk
Placement (α=.01, β=.25) versus Theoretical Mastery Ability .75

  Risk-incorporated              Theoretical Criterion = 75 Percent Items Correct (c=66)
  cut-score
  (c=63)                         Mastery      non-Mastery      Total

  Mastery                           38              3             41
  non-Mastery                        0             58             58
  Total                             38             61             99

Since most of the students in the engineering sample were exempted
from English courses or only had to take one remedial course, the
distribution of ability is probably skewed. As a result, there is still a
greater number of false masters than false non-masters. It is clear,
however, that the types of misclassification increase or decrease according
to how the risk levels are set.

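The continuity-corrected normal approximation used in this section can be reproduced with the standard library. The sketch below is an illustration under the assumptions stated in the text (n = 87, sufficient ability .75, insufficient ability .60); the function names are invented here, and small differences from the reported figures come from using unrounded normal quantiles instead of 1.28, 2.33 and 1.645.

```python
from math import sqrt
from statistics import NormalDist


def upper_bound_for_masters(n, theta1, beta):
    """Score x1 at which the risk of failing a theta1 examinee is about beta."""
    z = NormalDist().inv_cdf(beta)  # negative when beta < .5
    return n * theta1 + z * sqrt(n * theta1 * (1 - theta1)) - 0.5


def lower_bound_for_nonmasters(n, theta0, alpha):
    """Score x2 at which the risk of passing a theta0 examinee is about alpha."""
    z = NormalDist().inv_cdf(1 - alpha)
    return n * theta0 + z * sqrt(n * theta0 * (1 - theta0)) + 0.5


x1 = upper_bound_for_masters(87, 0.75, 0.10)
x2_strict = lower_bound_for_nonmasters(87, 0.60, 0.01)
x2_relaxed = lower_bound_for_nonmasters(87, 0.60, 0.05)
print(round(x1, 2), round(x2_strict, 2), round(x2_relaxed, 2))
```

A cut-off score has to be at least x2 to hold the false-positive risk and no more than about x1 to hold the false-negative risk, which is why x2 above x1 signals that the two risk levels cannot both be met with 87 items.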
Huynh's Optimal Decision Rule Model 

Huynh's model (1976b) was applied to the data using
the approximation formula, which assumes a constant loss ratio and a 0-1
referral success. The α, β parameters of the beta distribution were
estimated to be (Huynh, 1976a):

$$
\hat\alpha = \left(-1 + \frac{1}{\alpha_{21}}\right)\hat\mu = 7.25
$$

$$
\hat\beta = -\hat\alpha + \frac{n}{\alpha_{21}} - n = 3.29
$$

where

$$
\alpha_{21} = \frac{n}{n-1}\left[1 - \frac{\hat\mu\,(n-\hat\mu)}{n\,\hat\sigma^{2}}\right],
\qquad \hat\mu = 59.84, \qquad \hat\sigma^{2} = 157.50
$$

When θ_0, true mastery, is assumed to be 75 percent correct, and the loss
ratio is one, the cut-score with Huynh's model is 65.66. A comparison
of classifications using the theoretical criterion and the cut-off score
derived from Huynh's model is shown in Table 6.

TABLE 6

  Huynh's Model                  Theoretical Criterion (c=66)
  (c=66)
                                 Master       non-Master       Total

  Master                            38              0             38
  non-Master                         0             61             61
  Total                             38             61             99



In this situation, where the losses associated with false-positive and
false-negative misclassification are assumed equal, no errors of
classification are observed. If, however, classifying a failure as a success
is twice as serious as a false non-master, a cut-off score of 67.48, or 68,
is found. The cross-tabulation would then be the same as Table 2, where the
norm-referenced cut-off score is used.
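The figures quoted for Huynh's model can be checked numerically. The sketch below assumes the KR-21 based moment estimates of the beta parameters and the approximate cut-score formula as read from this paper; the function names are illustrative, and the loss ratio Q = C_s/C_f is taken to be .5 when a false positive is twice as serious as a false negative.

```python
from math import sqrt
from statistics import NormalDist


def kr21(n, mean, variance):
    """KR-21 reliability coefficient from test length, mean and variance."""
    return n / (n - 1) * (1 - mean * (n - mean) / (n * variance))


def beta_parameters(n, mean, variance):
    """Moment estimates of the beta parameters (assumed form, after Huynh 1976a)."""
    a21 = kr21(n, mean, variance)
    alpha = (-1 + 1 / a21) * mean
    beta = -alpha + n / a21 - n
    return alpha, beta


def huynh_cut_score(n, alpha, beta, theta0, loss_ratio):
    """Approximate optimal cut score; z is the 100/(1+Q) percentile of N(0, 1)."""
    z = NormalDist().inv_cdf(1 / (1 + loss_ratio))
    m = n + alpha + beta - 1
    return m * theta0 + z * sqrt(m * theta0 * (1 - theta0)) - alpha + 0.5


est_alpha, est_beta = beta_parameters(87, 59.84, 157.50)
c_equal = huynh_cut_score(87, 7.25, 3.29, 0.75, 1.0)    # equal losses
c_strict = huynh_cut_score(87, 7.25, 3.29, 0.75, 0.5)   # false positive twice as costly
print(round(est_alpha, 2), round(est_beta, 2), round(c_equal, 2), round(c_strict, 2))
```

The estimates recomputed from the rounded mean and variance differ from the reported 7.25 and 3.29 only in the second decimal place, and both cut scores round up to the integer values used in the text (66 and 68).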

Wilcox's Optimal Cut-Off Score Based on Observed Scores and an External
Criterion

Wilcox (1979) proposed a procedure that simply classifies examinees
into masters and non-masters on the basis of some external criterion and
then finds the test cut-off score which minimizes the number of
misclassifications. For example, if GPA were taken as the external criterion,
the classification of masters/non-masters would depend on the GPA needed
to remain in good standing as a graduate student, namely a 3.25 or above.
Plotting the various cut-off score possibilities along the X-axis and the
number of classification errors on the Y-axis, a graph such as the one in
Figure 1 is obtained. The minimum number of misclassifications occurs at
a cut-off score of 43. This same score is obtained when a similar graph
is drawn using the number of failed courses as the external criterion, and
a non-master is defined as one who fails more than one course in the first
year of graduate study.
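The search over candidate cut-off scores is simple enough to sketch directly. The data below are hypothetical (five examinees with a pass/fail flag on the external criterion), and the function names are illustrative; ties are resolved in favor of the lowest cut score, which is a choice made here and not part of the procedure as described.

```python
def count_errors(scores, passed_criterion, cut):
    """Number of examinees whose test classification disagrees with the criterion."""
    errors = 0
    for score, passed in zip(scores, passed_criterion):
        classified_master = score >= cut
        if classified_master != passed:
            errors += 1
    return errors


def best_cut_score(scores, passed_criterion):
    """Lowest cut score with the fewest misclassifications."""
    candidates = range(min(scores), max(scores) + 2)
    return min(candidates, key=lambda c: count_errors(scores, passed_criterion, c))


# Hypothetical data: test scores and whether each examinee met the
# external criterion (for example, a GPA of 3.25 or above).
scores = [40, 45, 50, 55, 60]
passed = [False, False, True, True, True]

cut = best_cut_score(scores, passed)
print(cut, count_errors(scores, passed, cut))  # 46 0
```

Plotting `count_errors` against every candidate cut gives the kind of curve described for Figure 1.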

Figure 1 shows that the optimal cut-off score is considerably lower
than the cut-off scores of the other illustrated methods, probably
indicating that the proficiency test is best for determining the minimum
language standard needed for successful academic performance, whereas the
high cut-off scores of the other methods are more concerned with a standard
at which one is reasonably sure of successful performance. In fact, this
interpretation is fairly consistent with UCLA's remedial English placement
practices for foreign students. The upper cut-off score of 68 is associated
with exempting students from all ESL course requirements, and a score of
about 30 is associated with the heaviest ESL course requirements while
still allowing enrollment in regular university classes.

FIGURE 1

Graph of Misclassification by Different Cut-Scores
Criterion: Overall GPA

Wilcox's Method for Approximating True Score Distribution

Methods proposed to approximate true score distributions can also
be used to examine the problems of setting cut-off scores. Let θ be the
percent correct true score of an examinee, x be an observed score having
as possible values 0, 1, 2, ..., n, where n is the number of dichotomously
scored items on a test, f(x|θ) be the conditional probability function of
observed scores given θ, and g(θ) be the probability density function of
true scores over a population of examinees. Keats and Lord (1962) proposed
a strong true-score model based on the assumption that f(x|θ) is the
binomial probability function

$$
f(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}
\tag{3}
$$

It is further assumed that the distribution of θ over the population of
examinees is given by

$$
g(\theta) = \frac{\Gamma(r+s)}{\Gamma(r)\,\Gamma(s)}\,\theta^{r-1}(1-\theta)^{s-1}
\tag{4}
$$

where Γ is the usual gamma function and where r and s are unknown parameters
that can be estimated via the examinees' observed test scores. This is the
family of beta distributions that is typically used in conjunction with (3).

Wilcox (1979) suggests replacing (4) with a more general family of
distributions given by

$$
g(\theta) = \sum_{j=0}^{\infty} \frac{e^{-\lambda}\lambda^{j}}{j!}\,
\frac{\Gamma(r+s+j)}{\Gamma(r+j)\,\Gamma(s)}\,\theta^{r+j-1}(1-\theta)^{s-1}
\tag{5}
$$

where λ, r and s are unknown parameters that are estimated using observed
test scores. This is the family of non-central beta distributions, which
contains the family of beta distributions (λ = 0) as a special case.

The motivation for (5) is that we obtain a better approximation to
g(θ), which in turn can have an effect on the choice of a passing score.

Using Wilcox's method, we need only the first three moments of the
true score distribution in order to approximate λ, r and s. The number
of examinees receiving an observed score of x on the 87 item ESL test
is presented in Table 7.

TABLE 7

Frequency Distribution of Total Scores
on the ESL Test
N=99

  Total Test                     Total Test
  Score         Frequency        Score         Frequency

  17                1            61                3
  25                1            62                2
  35                1            63                2
  42                2            65                1
  43                1            66                2
  44                2            67            [illegible]
  45                3            68                1
  46                2            69                2
  48                2            70                4
  4[illegible]  [illegible]      71                2
  49                3            72                3
  50                3            73                4
  51                3            74                1
  53                5            75                3
  54                1            76                1
  55                3            77                3
  56                2            78                1
  57                5            80                3
  58                3            81                1
  59                4            84                1
  60                2


The first three moments of the true score distribution were estimated to be
.688, .491 and .362 respectively.

Setting λ = 0 and using the method of moments, the usual estimators of r
and s (e.g., Huynh, 1976; Wilcox, 1977) yield r = 7.784 and s = 3.533.
[The estimating equations are not legible in the source.] From standard
results on the beta distribution, these values of r, s and λ imply that
μ_1 = .688, μ_2 = .491, μ_3 = .360. In order to find the best estimates of
μ_1, μ_2, μ_3, different λ values were tried, and the results are presented
in the following table (Table 8):

TABLE 8

Estimated Values of the First Three Moments

               Using AU615EQ             Using AUG15uu

               r          s              μ_1        μ_2        μ_3

  λ = 0        7.7838     3.5331         .6878      .49051     .36038
  λ = .3       6.9031     3.2895         .68780     .49227     .36308
  λ = .4       6.9196     3.3068         .6878      .49188     .36303
  λ = .5       7.1440     3.4109         .68781     .49179     .36173
  λ = 1.0      6.4792     3.3560         .6878      .49207     .36335
  λ = 2.0      6.3069     3.6992         .6878      .49101     .36124
  λ = 3.0      5.4857     3.7448         .6878      .49144     .36199
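The λ = 0 row of Table 8 can be checked against the textbook method-of-moments estimators for an ordinary beta distribution. The sketch below uses those standard formulas, which may differ in form from the estimating equations in the original paper; the function names are illustrative.

```python
def beta_from_moments(mu1, mu2):
    """Beta parameters (r, s) matching the first two raw moments."""
    variance = mu2 - mu1**2
    common = (mu1 - mu2) / variance  # equals r + s for a beta distribution
    return mu1 * common, (1 - mu1) * common


def beta_raw_moment(r, s, k):
    """k-th raw moment of a beta(r, s) distribution."""
    moment = 1.0
    for i in range(k):
        moment *= (r + i) / (r + s + i)
    return moment


# First two moments taken from the lambda = 0 row of Table 8.
r, s = beta_from_moments(0.6878, 0.49051)
mu3 = beta_raw_moment(r, s, 3)
print(round(r, 3), round(s, 3), round(mu3, 4))
```

The recovered r and s agree with the tabled 7.7838 and 3.5331 to about two decimal places (the tabled moments are rounded), and the implied third moment is close to the tabled .36038, which falls short of the estimated .362 and motivates the search over non-zero λ.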



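Notice that for λ equal to 3, solving for r and s yields r = 5.4857
and s = 3.7448. These values of r, s and λ imply that μ_1 = .6878,
μ_2 = .4914 and μ_3 = .3620. Thus, these values of r, s, and λ are in
reasonably good agreement with the estimated values of μ_1, μ_2 and μ_3.
Assuming these approximations to the true score distribution g(θ), the
probability of committing a false-positive (A) and false-negative error (B)
can thus be estimated using:

$$
A = \sum_{x=x_0}^{n}\sum_{j=0}^{100} \frac{e^{-\lambda}\lambda^{j}}{j!}\binom{n}{x}
\frac{\Gamma(r+s+j)}{\Gamma(r+j)\,\Gamma(s)}
\int_{0}^{\theta_0} \theta^{x+r+j-1}(1-\theta)^{n-x+s-1}\,d\theta
$$

$$
B = \sum_{x=0}^{x_0-1}\sum_{j=0}^{100} \frac{e^{-\lambda}\lambda^{j}}{j!}\binom{n}{x}
\frac{\Gamma(r+s+j)}{\Gamma(r+j)\,\Gamma(s)}
\int_{\theta_0}^{1} \theta^{x+r+j-1}(1-\theta)^{n-x+s-1}\,d\theta
$$

where x_0 is the cut-off score and θ_0 is the true-score standard separating
masters from non-masters.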



When the cut score is set at 66 on this 87 item ESL test, the probabilities
of committing a false-positive and a false-negative error are .010 and .152
respectively. When the cut-off score is set at 65, the probabilities are .015
and .126. Therefore, the total probability of misclassification is less than
when the cut-off score is 66. Using 42 and 43 as the cut-off scores, as
computed based on Wilcox's method of choosing an optimal passing score with
an external criterion, the probability of Type A error is .408 and .397
respectively, and Type B errors become minimal, .202E-6 and .548E-6.
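The way the two error probabilities move with the cut-off score can be explored numerically. The sketch below integrates a binomial error model against a non-central beta true-score density with Poisson mixing weights, using a midpoint rule. It is an illustration under assumptions: the true-score standard θ_0 behind the reported figures is not stated in this section, so .75 is assumed here, the series is truncated at 100 terms, and the function names are invented for the example. It is not claimed to reproduce the reported probabilities.

```python
import math


def noncentral_beta_pdf(theta, r, s, lam, terms=101):
    """Density of the non-central beta family (Poisson-weighted beta mixture)."""
    total = 0.0
    for j in range(terms):
        if lam == 0:
            if j > 0:
                break  # lambda = 0 reduces to the ordinary beta density
            log_weight = 0.0
        else:
            log_weight = -lam + j * math.log(lam) - math.lgamma(j + 1)
        log_beta = (math.lgamma(r + s + j) - math.lgamma(r + j) - math.lgamma(s)
                    + (r + j - 1) * math.log(theta) + (s - 1) * math.log(1 - theta))
        total += math.exp(log_weight + log_beta)
    return total


def error_probabilities(n, x0, theta0, r, s, lam, grid=1000):
    """Approximate P(false positive) and P(false negative) for cut score x0."""
    coeffs = [math.comb(n, x) for x in range(n + 1)]
    step = 1.0 / grid
    false_pos = false_neg = 0.0
    for k in range(grid):
        theta = (k + 0.5) * step  # midpoint rule avoids theta = 0 and 1
        weight = noncentral_beta_pdf(theta, r, s, lam) * step
        if theta < theta0:
            # Non-master who scores at or above the cut.
            tail = sum(coeffs[x] * theta**x * (1 - theta) ** (n - x)
                       for x in range(x0, n + 1))
            false_pos += weight * tail
        else:
            # Master who scores below the cut.
            tail = sum(coeffs[x] * theta**x * (1 - theta) ** (n - x)
                       for x in range(0, x0))
            false_neg += weight * tail
    return false_pos, false_neg


# Parameters from the lambda = 3 row of Table 8; theta0 = .75 is an assumption.
fp66, fn66 = error_probabilities(87, 66, 0.75, 5.4857, 3.7448, 3.0)
fp43, fn43 = error_probabilities(87, 43, 0.75, 5.4857, 3.7448, 3.0)
print(fp66, fn66)
print(fp43, fn43)
```

Whatever standard is used, lowering the cut-off score can only raise the false-positive probability and lower the false-negative probability, which is the trade-off the reported figures for cut scores of 66, 65, 43 and 42 illustrate.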

Discussion and Recommendations 

Since the purpose of the ESL test is to identify students who lack
the language skill required to go through graduate school successfully, it
appears that a number of other factors also need to be considered
in selecting a cut-off score. The first factor, which has been the major
consideration for all illustrated methods, is the loss associated with
misclassification. Millman (1973) stated that although there are multiple
methods for setting cut-off scores, none of them eliminates the element of
judgment that occurs at some stage of their execution; this statement is
still true. Recent developments on the topic of standard setting, however,
enable us to make more informed decisions. How much risk are we willing
to take? (Very little? Ten percent? Fifty percent?) What type of risk
are we more willing to take? (Promote students who have not attained
proficiency?) Depending on the levels of risk one is willing to take, a
different cut-off score can be chosen accordingly.

Another factor of concern is the predictive and construct validity
of the test content with respect to the chosen external criteria. The
intercorrelation between the ESL test score and overall GPA is .22
(Table 9), and the ESL score is slightly more positively correlated with the
first year's GPA. This finding is expected since, after an initial stage,
students all acquire a certain level of proficiency in English. The
overall GPA, as well as first year GPA, shows the highest correlation with
scores on the Advanced Graduate Record Examination, which is an achievement
test.

TABLE 9

Correlation Matrix of ESL Data

                              1       2       3       4       5       6

  1  Overall GPA            1.00
  2  Year 1 GPA              .98    1.00
  3  GRE Verbal              .18     .20    1.00
  4  GRE Quantitative        .38     .34     .20    1.00
  5  GRE Advanced            .57     .55     .23     .51    1.00
  6  ESL                     .22     .25     .33     .57     .27    1.00


The multiple correlation coefficient of scores on the advanced GRE and ESL
with overall GPA was .56 (R² = .31). The relatively low correlation between
the ESL test and performance in graduate study may indicate that for
Engineering majors, the skills tested by the ESL test have a low impact
on achievement. Therefore, a lower cut-off score, such as 42 or 43, may
serve screening purposes adequately. By studying the relationships
between English competency and performance in subject areas for various
fields of study, e.g., the humanities, sciences, and social sciences, we may
decide that different cut-off scores are needed to ensure a given level of
risk. The problem then becomes one of gathering the appropriate data
to obtain estimates for the parameters used in the various cut-off score
models. No matter how sophisticated these models may be in describing
such things as a true score distribution, the decision makers must still
take into account substantive issues unique to their own applications of
the models.